Goto

Collaborating Authors

 generalization risk


Conformal Prediction and Trustworthy AI

arXiv.org Artificial Intelligence

Conformal predictors are machine learning algorithms developed in the 1990's by Gammerman, Vovk, and their research team, to provide set predictions with guaranteed confidence level. Over recent years, they have grown in popularity and have become a mainstream methodology for uncertainty quantification in the machine learning community. From its beginning, there was an understanding that they enable reliable machine learning with well-calibrated uncertainty quantification. This makes them extremely beneficial for developing trustworthy AI, a topic that has also risen in interest over the past few years, in both the AI community and society more widely. In this article, we review the potential for conformal prediction to contribute to trustworthy AI beyond its marginal validity property, addressing problems such as generalization risk and AI governance. Experiments and examples are also provided to demonstrate its use as a well-calibrated predictor and for bias identification and mitigation.


Neural network-enhanced integrators for simulating ordinary differential equations

arXiv.org Artificial Intelligence

Numerous applications necessitate the computation of numerical solutions to differential equations across a wide range of initial conditions and system parameters, which feeds the demand for efficient yet accurate numerical integration methods.This study proposes a neural network (NN) enhancement of classical numerical integrators. NNs are trained to learn integration errors, which are then used as additive correction terms in numerical schemes. The performance of these enhanced integrators is compared with well-established methods through numerical studies, with a particular emphasis on computational efficiency. Analytical properties are examined in terms of local errors and backward error analysis. Embedded Runge-Kutta schemes are then employed to develop enhanced integrators that mitigate generalization risk, ensuring that the neural network's evaluation in previously unseen regions of the state space does not destabilize the integrator. It is guaranteed that the enhanced integrators perform at least as well as the desired classical Runge-Kutta schemes. The effectiveness of the proposed approaches is demonstrated through extensive numerical studies using a realistic model of a wind turbine, with parameters derived from the established simulation framework OpenFast.


Structure Regularization for Structured Prediction

Neural Information Processing Systems

While there are many studies on weight regularization, the study on structure regularization is rare. Many existing systems on structured prediction focus on increasing the level of structural dependencies within the model. However, this trend could have been misdirected, because our study suggests that complex structures are actually harmful to generalization ability in structured prediction. To control structure-based overfitting, we propose a structure regularization framework via structure decomposition, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power. We show both theoretically and empirically that structure regularization can effectively control overfitting risk and lead to better accuracy. As a by-product, the proposed method can also substantially accelerate the training speed. The method and the theoretical results can apply to general graphical models with arbitrary structures. Experiments on well-known tasks demonstrate that our method can easily beat the benchmark systems on those highly-competitive tasks, achieving record-breaking accuracies yet with substantially faster training speed.


Structure Regularization for Structured Prediction

Neural Information Processing Systems

While there are many studies on weight regularization, the study on structure regularization is rare. Many existing systems on structured prediction focus on increasing the level of structural dependencies within the model. However, this trend could have been misdirected, because our study suggests that complex structures are actually harmful to generalization ability in structured prediction. To control structure-based overfitting, we propose a structure regularization framework via structure decomposition, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power. We show both theoretically and empirically that structure regularization can effectively control overfitting risk and lead to better accuracy. As a by-product, the proposed method can also substantially accelerate the training speed. The method and the theoretical results can apply to general graphical models with arbitrary structures. Experiments on well-known tasks demonstrate that our method can easily beat the benchmark systems on those highly-competitive tasks, achieving record-breaking accuracies yet with substantially faster training speed.


Topology-aware Robust Optimization for Out-of-distribution Generalization

arXiv.org Artificial Intelligence

Out-of-distribution (OOD) generalization is a challenging machine learning problem yet highly desirable in many high-stake applications. Existing methods suffer from overly pessimistic modeling with low generalization confidence. As generalizing to arbitrary test distributions is impossible, we hypothesize that further structure on the topology of distributions is crucial in developing strong OOD resilience. To this end, we propose topology-aware robust optimization (TRO) that seamlessly integrates distributional topology in a principled optimization framework. More specifically, TRO solves two optimization objectives: (1) Topology Learning which explores data manifold to uncover the distributional topology; (2) Learning on Topology which exploits the topology to constrain robust optimization for tightlybounded generalization risks. We theoretically demonstrate the effectiveness of our approach and empirically show that it significantly outperforms the state of the arts in a wide range of tasks including classification, regression, and semantic segmentation. Moreover, we empirically find the data-driven distributional topology is consistent with domain knowledge, enhancing the explainability of our approach. Recent years have witnessed a surge of applying machine learning (ML) in high-stake and safetycritical applications.


More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize

arXiv.org Machine Learning

Of theories for why large-scale machine learning models generalize despite being vastly overparameterized, which of their assumptions are needed to capture the qualitative phenomena of generalization in the real world? On one hand, we find that most theoretical analyses fall short of capturing these qualitative phenomena even for kernel regression, when applied to kernels derived from large-scale neural networks (e.g., ResNet-50) and real data (e.g., CIFAR-100). On the other hand, we find that the classical GCV estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings. To bolster this empirical finding, we prove that the GCV estimator converges to the generalization risk whenever a local random matrix law holds. Finally, we apply this random matrix theory lens to explain why pretrained representations generalize better as well as what factors govern scaling laws for kernel regression. Our findings suggest that random matrix theory, rather than just being a toy model, may be central to understanding the properties of neural representations in practice.


HierMUD: Hierarchical Multi-task Unsupervised Domain Adaptation between Bridges for Drive-by Damage Diagnosis

arXiv.org Artificial Intelligence

Monitoring bridge health using vibrations of drive-by vehicles has various benefits, such as no need for directly installing and maintaining sensors on the bridge. However, many of the existing drive-by monitoring approaches are based on supervised learning models that require labeled data from every bridge of interest, which is expensive and time-consuming, if not impossible, to obtain. To this end, we introduce a new framework that transfers the model learned from one bridge to diagnose damage in another bridge without any labels from the target bridge. Our framework trains a hierarchical neural network model in an adversarial way to extract task-shared and task-specific features that are informative to multiple diagnostic tasks and invariant across multiple bridges. We evaluate our framework on experimental data collected from 2 bridges and 3 vehicles. We achieve accuracies of 95% for damage detection, 93% for localization, and up to 72% for quantification, which are ~2 times improvements from baseline methods.


Dependency Structure Misspecification in Multi-Source Weak Supervision Models

arXiv.org Artificial Intelligence

Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into \emph{labeling functions} (LF), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on test set performance of a downstream classifier are understudied. This presents a serious awareness gap to practitioners, in particular since the dependency structure among LFs is frequently ignored in field applications of DP. We analyse modeling errors due to structure over-specification. We derive novel theoretical bounds on the modeling error and empirically show that this error can be substantial, even when modeling a seemingly sensible structure.


Train simultaneously, generalize better: Stability of gradient-based minimax learners

arXiv.org Machine Learning

The success of minimax learning problems of generative adversarial networks (GANs) has been observed to depend on the minimax optimization algorithm used for their training. This dependence is commonly attributed to the convergence speed and robustness properties of the underlying optimization algorithm. In this paper, we show that the optimization algorithm also plays a key role in the generalization performance of the trained minimax model. To this end, we analyze the generalization properties of standard gradient descent ascent (GDA) and proximal point method (PPM) algorithms through the lens of algorithmic stability under both convex concave and non-convex non-concave minimax settings. While the GDA algorithm is not guaranteed to have a vanishing excess risk in convex concave problems, we show the PPM algorithm enjoys a bounded excess risk in the same setup. For non-convex non-concave problems, we compare the generalization performance of stochastic GDA and GDmax algorithms where the latter fully solves the maximization subproblem at every iteration. Our generalization analysis suggests the superiority of GDA provided that the minimization and maximization subproblems are solved simultaneously with similar learning rates. We discuss several numerical results indicating the role of optimization algorithms in the generalization of the learned minimax models.


An Application of Multiple-Instance Learning to Estimate Generalization Risk

arXiv.org Machine Learning

An Application of Multiple-Instance Learning to Estimate Generalization Risk Daiki Suehiro Kyushu University / RIKEN Abstract We focus on several learning approaches that employ max-operator to evaluate the margin. For example, such approaches are commonly used in multi-class learning task and top-rank learning task. In general, in order to estimate the theoretical generalization risk, we need to individually evaluate the complexity of each hypothesis class used in the learning approaches. In this paper, we provide a technique to estimate a theoretical generalization risk for such learning approaches in a same fashion. The key idea is to "redundantly" reformulate the learning problem as one-class multiple-instance learning by redefining the specific input space based on the original input space. Surprisingly, we succeed to improve the generalization risk bounds for some multi-class learning and top-rank learning algorithms. 1 Introduction A lot of margin-based learning approaches such as SVMs have a strong theoretical generalization risk bound and well works in practice. In this paper, we focus on learning approaches that define a margin based on max-operator.